ABSTRACT

A "forward move algorithm" and some of its formal properties are presented
for use in a practical syntactic error recovery scheme for LR parsers. The
algorithm finds a "valid fragment" (comparable to a valid prefix) just to
the right of a point of error detection. For expositional purposes the
algorithm is presented as parsing arbitrarily far beyond the point of error
detection in a "parallel" mode, as long as all parses agree on the read or
reduce action to be taken at each parse step. In practice the forward move
is achieved serially by adding "recovery states" to the LR machine. Based
on the formal properties of the forward move we propose a practical error
recovery algorithm that uses the "right context" accumulated by the forward
move. The performance of the recovery algorithm is illustrated in a specific
case and discussed in general.

Key words and phrases: syntax errors, error recovery, parsing, LR(k),
SLR(k), LALR(k).

CR categories: 4.12, 4.42, 5.23.

0. INTRODUCTION

Over the past twenty years much effort has been invested in the science
of deterministic parsing; that is, determining the phrase structure of a
sentence generated by a context-free grammar [H&U 69] during a single scan,
usually from left to right. The two pinnacles of this research are the LL(k)
and LR(k) grammars and their parsers, respectively top-down and bottom-up
techniques [A&U 72].

Unfortunately, the more adept parsing techniques have gotten, the more
difficult it has seemed to achieve flexible error recovery. It seems that
the more the parser knows about the input possibilities and specializes
itself via state transitions to restricted parts of itself, the more
difficult it is for it, in the face of a detected error, to back out and get
global information necessary for good error recovery. In the words of
Graham and Rhodes [G&R 75]: "The fact that the next move of the parser can
depend on the entire correct prefix already analyzed makes it difficult or
impossible to start up the parser after the error [detection] point."

This paper is a contribution toward giving LR parsers some such global
capabilities. Indeed, we show that it is easy to extend them to start up
after an error is detected and to parse arbitrarily far ahead, gathering
right context. This context can then be used to guide the selection and
evaluation of repair attempts. Thus, we decompose the notion of error
recovery into (1) gathering right context and (2) a repair strategy.

History. Graham and Rhodes [G&R 75] proposed an error recovery scheme
for deterministic bottom-up parsers that involves "condensing" context about
the point at which an error was detected. A "backward move" condenses
context to the left and a "forward move" gathers context to the right. Such
context is valuable input to an error repair strategy. Graham and Rhodes
show how the condensation is done for simple precedence parsers and give
an error repair strategy that uses the condensed context.

We have adapted the general idea of Graham and Rhodes to LR parsers
[A&J 74], by which we mean LR(k) parsers and all their variants: LALR(k),
SLR(k), etc. Some of the investigation has already been reported in
Pennello's Master's Thesis [Pen 77] (see also [O'H 76]). The present paper
refines the theoretical results developed in [Pen 77] and adds some
algorithms and some empirical results from a recent implementation.

Briefly, we found the "backward move" to be detrimental in enough cases
that we abandoned it, in favor of a philosophy of trying to do only what is
consistent with every context for as long as possible, resorting to guesses
only when we know of no other way to proceed. Pennello developed a
"parallel parse" exposition of the "forward move" for LR parsers that
facilitates understanding and proof of results, and we show how to
"serialize" it so that in practice the parser simply has some extra
"recovery states" that work just as the other states do, but are entered
only in recovery mode. Several theorems were developed, primarily relating
to the "derived valid fragment" accumulated during the forward move.
Druseikis and Ripley [D&R 77] have reported some similar results, which we
note as the issues arise.

Preview. We first present the forward move algorithm (FMA) in its
parallel form. Then we state several properties of FMA that are relevant to
error repair. These properties suggest ways that the "forward context" may
be used in a repair strategy. Perhaps the most important of these indicates
that the forward context may be used to efficiently verify that a repair
attempt is "consistent" with the input text parsed by FMA.

Next we present several algorithms that are used in the repair strategy.
The analysis there starts with the simplifying assumption of but a single
error of insertion, replacement, or deletion of a single terminal symbol.
The effect of delayed versus immediate detection is discussed, and then
multiple errors are treated. For clarity and simplicity the algorithms are
presented without regard to certain practicalities, which are then discussed
in the text.

Finally, we show how to serialize FMA for a practical implementation,
and we present some statistics on how many extra states are needed for some
well known programming languages. Then we demonstrate how our error recovery
performed on the example Algol program of Graham and Rhodes [G&R 75].

1. TERMINOLOGY

We review basic notation and terminology for strings, grammars, and
parsers. A vocabulary (or alphabet) V is a finite set of symbols. V* denotes
the set of all strings of symbols from V. V+ denotes V* less Λ, the empty
string. If x is a nonempty string, First x denotes the first symbol of x and
Rest x denotes x stripped of its first symbol. (We typically do not put
parentheses around arguments to functions when the meaning is clear, as
above.)

A context-free grammar G is a quadruple (N,T,S,P), i.e. the nonterminals,
terminals, start symbol, and productions, respectively; we define V = N ∪ T.
Each production is a pair (A,w), left part and right part, written A → w,
where A ∈ N and w ∈ V*. → is the rightmost derivation relation, in which the
rightmost nonterminal is replaced at each step; →+ is its transitive closure
and →* is its transitive-reflexive closure. We assume a production
S → S'⊥ ∈ P, where S' ∈ N, ⊥ ∈ T, and neither S nor ⊥ appears in any other
production. The language generated by G is L(G) = {w ∈ T* | S →+ w}.
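The definitions above can be made concrete with a small sketch. The toy grammar used here (S → E $, E → ( E ) | i, with "$" standing in for the end marker) and the helper name rightmost_step are assumptions made for illustration only; they are not the grammar of Figure 1.

```python
# Toy grammar (N, T, S, P), assumed for illustration.
N = {"S", "E"}
T = {"(", ")", "i", "$"}          # "$" plays the role of the end marker
START_SYMBOL = "S"
P = [("S", ["E", "$"]), ("E", ["(", "E", ")"]), ("E", ["i"])]


def rightmost_step(form, production):
    """Apply `production` to the rightmost nonterminal of `form`.

    Returns the new sentential form, or None when the rightmost
    nonterminal is not the left part of `production` (or none exists).
    """
    left, right = production
    for pos in range(len(form) - 1, -1, -1):
        if form[pos] in N:
            if form[pos] != left:
                return None
            return form[:pos] + right + form[pos + 1:]
    return None


# A rightmost derivation S -> E $ -> ( E ) $ -> ( i ) $.
form = [START_SYMBOL]
for production in (P[0], P[1], P[2]):
    form = rightmost_step(form, production)
```

Each call replaces exactly one nonterminal, so the loop traces one derivation in the relation →; a sentence of L(G) is reached when no nonterminal remains.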

An LR parser, i.e. an LR(1), LALR(1), SLR(1), or other such parser
[A&J 74], for G = (N,T,S,P) is a sextuple (K, V, P, START, SIGMA, REDUCE)
where K is a finite set of states, START ∈ K is the start state, SIGMA is
the transition function mapping K × V into K, and REDUCE maps K × V into
2^P. If SIGMA(q,h) = p, we also write this as the transition q -h→ p. From
SIGMA and REDUCE we derive the parser decision function PD, mapping K × V
into 2^M, where M = {read, accept} ∪ P; PD indicates, for a given state and
input symbol, all possible actions the parser may take; if the grammar G is
LR, PD always yields at most a singleton set. PD is defined as

    PD(q,h) = {read | q -h→ q' for some q' ∈ K}
              ∪ {accept | h = ⊥ and q -h→ q' for some q' ∈ K}
              ∪ REDUCE(q,h).

In figures we represent LR parsers as state diagrams in which states
are connected by arcs labelled with elements of V, according to SIGMA. For
each state q in which REDUCE indicates a possible choice of a reduction by
production p, we list p and its 1-symbol look-ahead set,
{h ∈ V | p ∈ REDUCE(q,h)}. Figure 1 depicts a state diagram for the LALR(1)
parser for a common arithmetic expression grammar; in this figure, for
example,

    PD(**_0, i) = {read},  SIGMA(**_0, i) = i_0,  PD(i_0, +) = {P → i}.

Note that REDUCE and PD may take a nonterminal as a second argument.
LR parser constructor algorithms easily generalize to include nonterminals
in look-ahead sets. We assume their inclusion, but also give alternate
means of implementing the results of this paper should their inclusion in
some implementation be too difficult. Also note that in Figure 1, there
happen to be no nonterminals in look-ahead sets due to the nature of the
grammar.
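As a minimal sketch of how PD is derived from SIGMA and REDUCE, the following uses hand-built SLR(1) tables for a toy grammar S → E $, E → ( E ) | i, where "$" stands for the end marker. The grammar, the state numbers, and the dictionary encoding are assumptions of this sketch, not the parser of Figure 1. One further assumption is labeled in the code: a transition on the end marker is reported as accept only, so that the state accessed by the start nonterminal yields {accept}.

```python
END = "$"   # stands for the end marker of the grammar

# Hand-built SLR(1) tables for the toy grammar (an assumption of this sketch).
PRODUCTIONS = {1: ("E", ("(", "E", ")")), 2: ("E", ("i",))}
SIGMA = {
    (0, "E"): 1, (0, "("): 2, (0, "i"): 3,
    (1, END): 4,
    (2, "E"): 5, (2, "("): 2, (2, "i"): 3,
    (5, ")"): 6,
}
REDUCE = {
    (3, ")"): {2}, (3, END): {2},
    (6, ")"): {1}, (6, END): {1},
}


def PD(q, h):
    """Return the set of moves the parser may make in state q on symbol h."""
    moves = set()
    if (q, h) in SIGMA:
        # Assumption: a transition on the end marker means accept, not read.
        moves.add("accept" if h == END else "read")
    for number in REDUCE.get((q, h), ()):
        moves.add(PRODUCTIONS[number])      # a production (left, right)
    return moves
```

For an LR grammar every result has at most one element; an empty result means that state q cannot accept symbol h.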

A path P in an LR parser is a sequence of states q_0, q_1, ..., q_n such
that q_0 -w_1→ q_1, q_1 -w_2→ q_2, ..., q_{n-1} -w_n→ q_n; we define
Top P = q_n. We say that P spells w = w_1 w_2 ... w_n; we define
Spelling P = w. An alternate notation for P is [q_0:w], given the parser or
its state diagram. We abbreviate [START:w] by [w]; thus [] denotes START
alone. We say that w accesses q iff Top [w] = q. The concatenation of two
paths [q:y] and [q':y'] where Top [q:y] = q' is written [q:y][q':y'] and
denotes [q:yy']. If, for some q, q', and h, SIGMA(q',h) = q, then
Accessing_symbol q = h (the accessing symbol for each state is unique,
except that START has no accessing symbol).
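The path notation can be sketched directly on a transition table. The table below belongs to a toy grammar S → E $, E → ( E ) | i ("$" stands for the end marker); it and the function names are assumptions made for illustration.

```python
END = "$"   # stands for the end marker
START = 0

# Transition function of a hand-built toy parser (assumed for illustration).
SIGMA = {
    (0, "E"): 1, (0, "("): 2, (0, "i"): 3,
    (1, END): 4,
    (2, "E"): 5, (2, "("): 2, (2, "i"): 3,
    (5, ")"): 6,
}


def path(q, w):
    """Return the state sequence [q:w], or None if no such path exists."""
    states = [q]
    for symbol in w:
        target = SIGMA.get((states[-1], symbol))
        if target is None:
            return None
        states.append(target)
    return states


def top(p):
    """Top P: the last state of path p."""
    return p[-1]


def accessing_symbol(q):
    """The unique symbol labelling every transition into q (None for START)."""
    symbols = {h for (_, h), target in SIGMA.items() if target == q}
    return next(iter(symbols)) if symbols else None


def spelling(p):
    """Spelling P: the accessing symbols of all states after the first."""
    return [accessing_symbol(q) for q in p[1:]]
```

Because each state has a unique accessing symbol, the spelling of a path can be recovered from the states alone, which is why a parser stack of states also determines the string it spells.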

An LR parser configuration is a pair (Z,R) where Z is a path and R ∈ T*.
For example, when the first three symbols of the sentence have been reduced
to E+T by the parser of Figure 1, it is in the configuration

    ([E+T], +i⊥) = ((START, E_0, +_0, T_0), +i⊥).

[Figure 1: arithmetic expression grammar and its LALR(1) parser]


The parser moves from one configuration to the next by reading or
reducing; thus we define a move as an element of the set {read} ∪ P. ⊢
denotes a move from one configuration to another; ⊢+ is its transitive
closure and ⊢* is its transitive-reflexive closure. We sometimes write
⊢ ... ⊢ for a sequence of moves. If C ⊢ C' by move M, we sometimes write
C ⊢_M C'. We define ⊢ as follows: Given some ([q:y],R), consider the
possible values of M = PD(Top [q:y], First R):

    case {read}: Let h = First R. Then
        ([q:y],R) ⊢_read ([q:yh], Rest R).

    case {A → w}: If [q:y] = [q:y'w] for some y', then
        ([q:y],R) ⊢_{A→w} ([q:y'A],R).

    case {accept} or {} or |M| > 1: There is no C such that
        ([q:y],R) ⊢ C. (I.e. the parser makes no move but
        instead halts, accepting or rejecting the input, or
        unable to proceed deterministically.)

The language recognized by an LR parser is {w ∈ T+ | ([],w) ⊢* accept},
where we abbreviate ([S'],⊥) by accept. (Note that PD(Top [S'],⊥) =
{accept}.) The usual LR parsing algorithm is easily deduced from the ⊢
relation.
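The remark that the usual LR parsing algorithm follows from the ⊢ relation can be illustrated by a short driver loop. This is a sketch under stated assumptions: the tables are hand-built for a toy grammar S → E $, E → ( E ) | i ("$" stands for the end marker), and a transition on the end marker is treated as accept only.

```python
END = "$"   # stands for the end marker

# Hand-built toy tables (assumptions of this sketch, not Figure 1).
PRODUCTIONS = {1: ("E", ("(", "E", ")")), 2: ("E", ("i",))}
SIGMA = {
    (0, "E"): 1, (0, "("): 2, (0, "i"): 3,
    (1, END): 4,
    (2, "E"): 5, (2, "("): 2, (2, "i"): 3,
    (5, ")"): 6,
}
REDUCE = {(3, ")"): {2}, (3, END): {2}, (6, ")"): {1}, (6, END): {1}}


def PD(q, h):
    moves = set()
    if (q, h) in SIGMA:
        moves.add("accept" if h == END else "read")
    moves.update(PRODUCTIONS[n] for n in REDUCE.get((q, h), ()))
    return moves


def parse(tokens):
    """Apply moves until none is possible; True iff the halt is an accept.

    `tokens` is expected to end with END.
    """
    stack, rest = [0], list(tokens)
    while True:
        h = rest[0] if rest else END
        moves = PD(stack[-1], h)
        if moves == {"read"}:                      # case {read}
            stack.append(SIGMA[(stack[-1], h)])
            rest = rest[1:]
        elif moves == {"accept"}:                  # halt, accepting
            return True
        elif len(moves) == 1:                      # case {A -> w}
            (left, right), = moves
            del stack[len(stack) - len(right):]
            stack.append(SIGMA[(stack[-1], left)])
        else:                                      # {} or more than one move
            return False
```

The three branches correspond one-to-one to the cases of the definition; the last branch covers both rejection and the inability to proceed deterministically.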

A final note: We present algorithms that may return results in two
different ways: via a return value and/or via so-called "result" parameters.
E.g., the phrase "if F(x,y,z) gives A, B ..." tests the boolean return value
of F, which is called with the three input expressions (actual parameters)
x, y, and z, and with the two result parameters, A and B; some side-effect
will happen to A and B according to the definition of F. Returning from such
a function is indicated by a statement such as "return True giving u, v."
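In a language with tuple results, this convention can be mimicked by returning the boolean together with the result parameters. The function split_at below is hypothetical and serves only to show the shape of such a call.

```python
def split_at(text, separator):
    """Hypothetical example of "return True giving u, v".

    Returns (True, u, v) when `separator` occurs in `text`, where u and v
    are the parts before and after its first occurrence; otherwise
    returns (False, None, None).
    """
    index = text.find(separator)
    if index == -1:
        return False, None, None
    return True, text[:index], text[index + len(separator):]


# "if split_at(x, sep) gives u, v ..." becomes:
found, u, v = split_at("J := X", ":=")
if found:
    parts = (u.strip(), v.strip())
```

The boolean plays the role of the return value and the remaining tuple members play the role of the result parameters.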


2. FORWARD MOVE ALGORITHM


When an error is detected, we wish to perform a "forward move" that
parses the input after the point of the error detection. The parse cannot
depend upon the left context already developed on the stack to proceed,
since it is precisely that left context that causes the parser to detect
the error. Thus, we devise an algorithm that parses ahead starting with
no left context. Our formulation of the "forward move algorithm" keeps
parsing input text until it must refer to the "missing" left context to
proceed. At that point it halts, and we use the developed "forward context"
in an error repair strategy.

Consider an Algol-like language in which the symbol "do" can appear in a
"for" or "while" construct. Suppose the two productions involving these
constructs are

    Stmt → for Id := Exp step Exp until Exp do Stmt      (1)

    Stmt → while Exp do Stmt                             (2)

where we capitalized nonterminals and left terminals uncapitalized. Now
consider the erroneous phrase

    for X := 1 step 1 until do begin J := X end

where we have omitted the third Exp in the "for" construct. The forward
move, if started with its input head at the symbol "do," reduces "do begin
J := X end" to "do Stmt." It goes no further than this because it needs
to know left context to determine whether at this point it must reduce by
production (1) or (2) (each of which ends in "do Stmt," not coincidentally).

The forward move can parse this much of the input because in both places
that "do" appears in the grammar, it is followed immediately by Stmt. Thus,
context to the left of the "do" is not necessary to reduce "do begin
J := X end" to "do Stmt." This situation occurs often enough in programming
languages that it is not uncommon for the forward move to make quite some
progress in the input text before it needs to refer to left context.

The essential idea of our algorithm is to carry out all possible parses
of the input text, as long as all parses agree as to the next move to make
(i.e. they must all manipulate the stack in the same way at each parse step)
and no parse refers to nonexistent left context. We present the algorithm
first as having a stack upon which we push sets of states rather than states;
these sets of states keep track of the parallel parses. At each step of the
forward move we inquire of each state in the set on the top of the stack
what its decision is with regard to the next symbol in the input. If all
the states in that top state set that accept the next input symbol agree as
to the next move, and this next move does not refer to nonexistent left
context, then we make that next move. For example, in the case of the "read"
move, we push on the stack the set of all the states that can be reached
from any state in the top state set by taking a transition on the next input
symbol.

Of course, manipulating sets of states is not practical, but we show
how the forward move algorithm can be easily converted into an algorithm
that manipulates states only, essentially like the conversion of a
nondeterministic finite state machine to a deterministic one. The converted
algorithm is as fast as the LR parsing algorithm.
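The analogy with the subset construction can be sketched as follows. This is only an illustration of the analogy, under the assumption that the state sets of interest are those reachable from the set of all states by set-level transitions; the actual serialization is the subject of a later section and may differ in detail. The transition table is that of a hand-built toy parser for S → E $, E → ( E ) | i ("$" stands for the end marker).

```python
END = "$"   # stands for the end marker

# Transition function of a hand-built toy parser (assumed for illustration).
SIGMA = {
    (0, "E"): 1, (0, "("): 2, (0, "i"): 3,
    (1, END): 4,
    (2, "E"): 5, (2, "("): 2, (2, "i"): 3,
    (5, ")"): 6,
}
K = frozenset(range(7))             # the set of all parser states


def set_transition(Q, symbol):
    """All states reachable from some state of Q by a transition on symbol."""
    return frozenset(SIGMA[(q, symbol)] for q in Q if (q, symbol) in SIGMA)


def reachable_state_sets(start):
    """Collect every nonempty state set reachable from `start`."""
    symbols = {h for (_, h) in SIGMA}
    seen, work = {start}, [start]
    while work:
        Q = work.pop()
        for h in symbols:
            successor = set_transition(Q, h)
            if successor and successor not in seen:
                seen.add(successor)
                work.append(successor)
    return seen


state_sets = reachable_state_sets(K)
# Sets that are not singletons have no counterpart among the ordinary states.
extra = [Q for Q in state_sets if len(Q) != 1]
```

For this toy parser only two of the reachable sets are not singletons, namely the set of all states and {1, 5}; under the assumption above these would be the candidates for extra "recovery states".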

Let ? be the set K of all the parser states. We now present the forward
move algorithm. The algorithm has an initialization step that causes it to
consume at least one symbol of the input, followed by repeated parse steps.

algorithm Forward move (FMA)
    input  R - the remaining input
    output the forward context developed
           and the input not consumed
    Init:
        let Z be the stack consisting only of ?
        Push {q' | q -First R→ q', q ∈ ?} on Z
        R ← Rest R
    repeat
        let h = First R, Q = Top Z,
            and MOVES = ∪ {PD(q,h) | q ∈ Q}
        select MOVES:
            case {read}:
                Push {q' | q -h→ q' and q ∈ Q} on Z
                R ← Rest R
            case {A → w}:        # Reduce, if possible:
                if |Z| > |w| then
                    Pop |w| state sets off of Z
                    Push {q' | q -A→ q' and q ∈ Top Z} on Z
                else return (Spelling Z, R) fi
                                 # w does not reside on the stack
            case {}              # We hit an error
              or {accept}        # R is ⊥
              or otherwise:      # |MOVES| > 1
                return (Spelling Z, R)
    end repeat
end FMA

Notice  that  the  repeated  parse  steps  of  FMA  are  identical  to  those  that 
the  parser  normally  follows,  save  the  "otherwise"  case,  the  manipulation  of 
sets  of  states  instead  of  states,  and  the  check,  just  prior  to  a reduction, 
that  the  entire  right  part  resides  on  the  stack. 
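The parallel form of FMA can be sketched in a few lines. The sketch below is not the authors' implementation: it assumes hand-built tables for a toy grammar S → E $, E → ( E ) | i ("$" stands for the end marker), treats a transition on the end marker as accept only, and keeps the spelling of the stack in a separate list rather than deriving it from accessing symbols.

```python
END = "$"   # stands for the end marker

# Hand-built toy tables (assumptions of this sketch, not Figure 1).
PRODUCTIONS = {1: ("E", ("(", "E", ")")), 2: ("E", ("i",))}
SIGMA = {
    (0, "E"): 1, (0, "("): 2, (0, "i"): 3,
    (1, END): 4,
    (2, "E"): 5, (2, "("): 2, (2, "i"): 3,
    (5, ")"): 6,
}
REDUCE = {(3, ")"): {2}, (3, END): {2}, (6, ")"): {1}, (6, END): {1}}
K = frozenset(range(7))             # plays the role of "?"


def PD(q, h):
    moves = set()
    if (q, h) in SIGMA:
        moves.add("accept" if h == END else "read")
    moves.update(PRODUCTIONS[n] for n in REDUCE.get((q, h), ()))
    return moves


def fma(remaining):
    """Return (forward context, input not consumed).

    `remaining` must end with END and contain at least one other symbol.
    """
    rest = list(remaining)
    if len(rest) < 2 or rest[-1] != END:
        raise ValueError("input must be nonempty text followed by END")
    stack, spelled = [K], []        # stack of state sets, and its spelling

    def push(symbol):
        stack.append(frozenset(SIGMA[(q, symbol)] for q in stack[-1]
                               if (q, symbol) in SIGMA))
        spelled.append(symbol)

    push(rest[0])                   # Init: always consume one symbol
    rest = rest[1:]
    while True:
        h = rest[0]
        moves = set().union(*(PD(q, h) for q in stack[-1]))
        if moves == {"read"}:
            push(h)
            rest = rest[1:]
        elif len(moves) == 1 and moves != {"accept"}:
            (left, right), = moves
            if len(stack) <= len(right):
                return spelled, rest        # w does not reside on the stack
            del stack[len(stack) - len(right):]
            del spelled[len(spelled) - len(right):]
            push(left)
        else:                               # {}, {accept}, or a disagreement
            return spelled, rest
```

On the remaining input ( i ) ) $ this sketch reduces "( i )" to E, reads the second ")", and then halts because the reduction by E → ( E ) would reach below the bottom of the stack; the forward context returned is E ).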

FMA essentially follows all paths that allow the parsing of the input
text. It halts in case "otherwise" when two different paths end up in states
that disagree as to how to continue the parse, or in case {A → w} when all
paths end up in states requiring a reduction over the ?, or in case {accept}
when we read the entire input, or in case {} when we encounter another error,
i.e. no path can be continued. The set MOVES computed by FMA represents all
the possible ways that the states in the top state set Q wish to treat the
input symbol h. Note that states q ∈ Q that cannot accept h (i.e. for
which PD(q,h) = {}) have no effect on the parsing decision unless all states
in Q cannot accept h (case {}); we extend each path as far as we can, even
though other paths terminate.

We illustrate the halts of cases {A → w} and "otherwise" by Examples
1 and 2 below, where the parser of concern is that of Figure 1.

Example 1. Let the erroneous input string be i(i)⊥. The parser stops
with state stack [i]. The following displays the execution of FMA on the
remainder of the input.

    FMA step        Stack after                   Rest of
    just made       FMA step                      input

    Init            ? {(_0}                       i ) ⊥
    {read}          ? {(_0} {i_0}                 ) ⊥
    {P → i}         ? {(_0} {P_0}                 ) ⊥
    {T → P}         ? {(_0} {T_0}                 ) ⊥
    {E → T}         ? {(_0} {E_1}                 ) ⊥
    {read}          ? {(_0} {E_1} {)_0}           ⊥
    {P → (E)}       ? {P_0}                       ⊥
    {T → P}         ? {T_0, T_1, T_2}             ⊥

The algorithm halts here because PD(T_0,⊥) ∪ PD(T_1,⊥) ∪ PD(T_2,⊥) =
{E → E + T, T → P ** T, E → T}. Of course, the expression between the
parentheses could have been arbitrarily long with the same result.

Example 2. Input is ()⊥. The parser halts with state stack [(].

    FMA step        Stack                         Rest

    Init            ? {)_0}                       ⊥

Halt: PD()_0,⊥) = {P → (E)}, and there are less than three items on the
stack above the ?.

In Example 1, we face the possibilities of reducing by three different
productions. E → T is the proper reduction only if what immediately precedes
the T is a "(" or nothing; E → E+T is the proper reduction only if what
immediately precedes the T is "E+"; and T → P**T is correct only if "P**"
precedes the T; but no context exists to the left of {T_0,T_1,T_2}. Thus, we
cannot continue parsing without making a guess, and must halt. In effect,
the three different situations in the parser in which it can read a T yield
three different decisions as to what to do with the T.

In Example 2, we attempt to reduce with P → (E), but find that "(E"
does not precede ")" on the stack. The attempted reduction gives us an
indication of what the user intended, however, and may provide useful
information for an error repair strategy called "stack forcing," as we
explain in the next section.

The initialization step Init of FMA guarantees that the algorithm
produces a forward context of length at least one. If we did not cause FMA to
read the first symbol, then it would consider all reductions that have the
first symbol in their look-ahead sets; possible choices between a read and
some reductions might have caused FMA to halt immediately in case "otherwise,"
making no progress whatsoever. (We assume also for the remainder of this
paper that we never invoke FMA on the input consisting only of ⊥, otherwise
we would immediately read ⊥ in step Init.)

In Section 5 we precompute the state sets of FMA as states. This allows
us to extend the concepts of transitions and paths to FMA's state sets. Hence,
if FMA consumes text u from string uv and produces forward context U, we may
write FMA:(?,uv) ⊢* ([?:U],v). The relation ⊢ that can be deduced from FMA
is exactly the same as that of the LR parsing algorithm, but to prevent
confusion between the LR parsing algorithm and FMA, we prefix moves of FMA
by "FMA:", as above.

3. FORMAL PROPERTIES OF FMA

Suppose FMA:(?,uv) ⊢* ([?:U],v). U satisfies important properties
that we explore in this section. Essentially, U is such that during a parse
of any sentence ending in uv, u must be reduced to U. We formalize and
indicate the significance of this property in this section. To do so, we
define some new terminology.

A valid prefix is any prefix of yw, where S →* yAv → ywv for some y ∈ V*,
A → w ∈ P, and v ∈ T*. The string spelled by the stack at any point during
LR parsing is a valid prefix. A valid fragment is a suffix of a valid
prefix; i.e. valid fragments are suffixes of the strings spelled by the
parser's stack. For example, for the grammar of Figure 1,
E →* E+P**i → E+(E)**i, so any prefix of E+(E) is a valid prefix, e.g. E+(,
and any suffix of E+( is a valid fragment, e.g. +(. We now define the
concept central to this paper.

Definition 1. U ∈ V* is a derived valid fragment (DVF) of sentence
suffix x iff

    (1) U →* u and x = uv for some u, v ∈ T*, and

    (2) for every valid prefix y such that
            ([y],uv) ⊢_read ... ⊢ accept,
            ([y],uv) ⊢* ([yU],v).

Thus, during a parse of any sentence ending in uv, at some point the parser
must reduce u to the valid fragment U. (The requirement that the first ⊢
is ⊢_read relates to the fact that FMA reads as its first move.)

In the context of error recovery, this concept has the following
significance: Suppose the parser encounters an error and halts in
configuration (Z,uv) with uv a suffix of a sentence, and that an error
repair algorithm suggests [y'] as a possible replacement for Z. We could
verify that ([y'],uv) ⊢* accept by actually trying the parse, but if many
such [y']s were to be tested, reparsing uv each time would be costly. The
significance of having some DVF U of uv is that in U we have a "partially
parsed" version of u and need not repeat this partial parse, for the DVF
property states that u must be reduced to U no matter what the string to
the left of uv.

A necessary (not sufficient) condition that ([y'],uv) ⊢* accept is
as follows: Let y be such that ([y'],uv) ⊢* ([y],uv) and PD(Top [y], First
uv) = {read}; then it must be the case that ([y],uv) ⊢* ([yU],v). This is
by definition of a DVF and due to the fact that ([y'],uv) ⊢* accept only
if ([y],uv) ⊢* accept. For any given [y'], then, this requires only that
we compute [y] and determine whether a path exists from Top [y] spelling U.
Determining the existence of the path [yU] is considerably cheaper than
reparsing u if u is much longer than U, and gives us an inexpensive test to
determine if the proposed stack repair [y'] is "good enough" to cause the
parser to consume u.

For future convenience we present the algorithm Consume_DVF that
performs the computation just described.

algorithm Consume_DVF
    input  [q:y] and U - a path and a forward context.
    output a boolean value, indicating
           whether [q:y] can consume U,
           and giving either the successfully
           computed path or an error message.

    # First, do reductions triggered by First U.
    while ([q:y],U) ⊢_M (Z,U)
          for M ∈ P (the productions)
          and for some path Z
    do y ← Spelling Z od        # i.e. reduce.

    # After all possible reductions have been made,
    # the next parsing decision must be a read.
    let MOVES = PD(Top [q:y], First U)
    if MOVES ≠ {read} then return False giving [q:y] fi

    # Now we must be able to find a path
    if path [q:yU] exists then return True giving [q:yU]
    else return False giving "path ended in error" fi
end Consume_DVF

Note that the effect is the same when the computation of [q:y] is based
on First U rather than First u, namely, if path [q:yU] exists both methods
produce the same result. However, if [q:yU] does not exist, using First U
rather than First u may cause the situation to be detected earlier: MOVES
might not be equal to {read}. We get this "earlier detection" capability
because First U may be a nonterminal representing not only First u but also
some text to its right, and hence First u may be a member of look-ahead sets
of which First U is not a member. Parsers not having nonterminals in
look-ahead sets must retain First u for any DVF U in order to use
Consume_DVF. Note further that Consume_DVF takes any path as its first
argument, rather than just paths beginning at START; this is because we
eventually intend to use it with paths produced by FMA also.
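A sketch of Consume_DVF along these lines follows. It assumes hand-built tables for a toy grammar S → E $, E → ( E ) | i ("$" stands for the end marker) whose look-ahead sets contain terminals only; accordingly the caller may pass First u as the hypothetical parameter first_u, which defaults to First U. The function and parameter names are illustrative, not taken from an existing implementation.

```python
END = "$"   # stands for the end marker

# Hand-built toy tables (assumptions of this sketch, not Figure 1).
PRODUCTIONS = {1: ("E", ("(", "E", ")")), 2: ("E", ("i",))}
SIGMA = {
    (0, "E"): 1, (0, "("): 2, (0, "i"): 3,
    (1, END): 4,
    (2, "E"): 5, (2, "("): 2, (2, "i"): 3,
    (5, ")"): 6,
}
REDUCE = {(3, ")"): {2}, (3, END): {2}, (6, ")"): {1}, (6, END): {1}}


def PD(q, h):
    moves = set()
    if (q, h) in SIGMA:
        moves.add("accept" if h == END else "read")
    moves.update(PRODUCTIONS[n] for n in REDUCE.get((q, h), ()))
    return moves


def consume_dvf(path, U, first_u=None):
    """Return (True, extended path) if `path` can consume U, else
    (False, path reached) or (False, "path ended in error")."""
    path = list(path)
    look = U[0] if first_u is None else first_u
    # First, do reductions triggered by the look-ahead symbol.
    while True:
        moves = PD(path[-1], look)
        if len(moves) != 1 or moves in ({"read"}, {"accept"}):
            break
        (left, right), = moves
        if len(path) <= len(right):     # right part not wholly on the path
            return False, path
        del path[len(path) - len(right):]
        path.append(SIGMA[(path[-1], left)])
    # After all possible reductions, the next decision must be a read.
    if moves != {"read"}:
        return False, path
    # Now we must be able to find a path spelling U.
    for symbol in U:
        target = SIGMA.get((path[-1], symbol))
        if target is None:
            return False, "path ended in error"
        path.append(target)
    return True, path
```

With the forward context E ) derived from the text ( i ) ), the stack [0, 2] (spelling "(") passes the test, whereas the stack [0] fails because no transition on ")" leaves the state reached on E.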

Our main result is: For some sentence suffix uv, if FMA:(?,uv) ⊢*
([?:U],v), then U is a DVF of uv. This results, intuitively, from the fact
that FMA parses uv with no assumptions about left context. Thus u must be
reduced to U no matter what the left context of u.

But first we consider a generalization of the DVF concept, and prove
our results in terms of it.

Definition 2. U ∈ V* is a restricted DVF (RDVF) of sentence suffix x
with respect to RQ ⊆ K iff

    (1) U →* u, x = uv for some u, v ∈ T*, and

    (2) for every valid prefix y accessing some q ∈ RQ
        such that ([y],uv) ⊢_read ... ⊢ accept,
            ([y],uv) ⊢* ([yU],v).

Note that a DVF is an RDVF with respect to ? = K.

To adapt FMA to the RDVF concept, we allow it to begin parsing with
any RQ ⊆ K, not just ? = K. Thus we may write FMA:(RQ,uv) ⊢* ([RQ:U],v).
This requires only that the Init step of FMA be altered to use RQ instead
of ?. We shall prove that if FMA:(RQ,uv) ⊢* ([RQ:U],v), then U is an RDVF
of uv with respect to RQ. Letting RQ = ? gives us our main result as a
corollary to Theorem 1.

The reason for the RDVF concept is that sometimes we will want to apply
FMA to x in a situation in which we do know something about the context to
the left of x. In general, starting FMA with some restricted context RQ
allows it both to get farther in the input text (there will be fewer
inadequacies) and to have better error detection capabilities. In essence,
RQ defines the possible contexts in which to parse x; if RQ = ? then we
essentially put no restriction on the left contexts.

First, we prove two lemmas concerning FMA. These demonstrate how FMA
keeps track in parallel of paths that would be computed by the parser.

Lemma 1. Let FMA:(RQ,uv) ⊢* ([RQ:U],v) = (RQ Q_1 ... Q_m, v). For any
path [yU] such that y accesses some p ∈ RQ, [p:U] = p q_1 ... q_m and
q_i ∈ Q_i, 1 ≤ i ≤ m.

Proof. By induction on m. Let U = a_1 ... a_m. For m = 1: p ∈ RQ,
hence by step Init (for a_1 ∈ T) or case {A → w} (for a_1 ∈ N) of FMA,
q_1 = SIGMA(p,a_1), q_1 ∈ Q_1, and Q_1 = {q' | q -a_1→ q' and q ∈ RQ}. Now
assume true for m = k; thus [RQ:a_1...a_k] = RQ Q_1 ... Q_k,
[p:a_1...a_k] = p q_1 ... q_k and q_i ∈ Q_i, 1 ≤ i ≤ k. By case {read} or
case {A → w} of FMA, q_{k+1} = SIGMA(q_k, a_{k+1}), q_{k+1} ∈ Q_{k+1}, and
Q_{k+1} = {q' | q -a_{k+1}→ q' and q ∈ Q_k}.

Lemma 2. Suppose FMA:(RQ,uv) ⊢_{M_1} ... ⊢_{M_r} ([RQ:U],v) =
(RQ Q_1 ... Q_m, v). If, for some y and y', ([y],uv) ⊢_{M_1} ... ⊢_{M_r}
([y'],v) and Top [y] ∈ RQ, then [y'] = [y] q_1 ... q_m, where q_i ∈ Q_i,
1 ≤ i ≤ m.

Proof. Given that the parser and FMA make the same moves M_1 ... M_r,
they stack the same symbols. Thus y' = yU, and the result follows from
Lemma 1.

Theorem 1. For some sentence suffix uv, if FMA:(RQ,uv) ⊢_{M_1} ...
⊢_{M_r} ([RQ:U],v), then U is an RDVF of uv with respect to RQ.

Proof. Let y be a valid prefix such that y accesses some q ∈ RQ,
([y],uv) ⊢_{M'_1} ... ⊢_{M'_{r'}} accept and M'_1 = read. By induction we
show property P(r) to hold, where P(k) is defined for k ≤ r as

    r' ≥ k and M'_i = M_i for 1 ≤ i ≤ k.

If P(r) holds, then ([y],uv) ⊢_{M_1} ... ⊢_{M_r} ([yU],v), the desired
conclusion.

For k = 1: M_1 = read = M'_1 by step Init of FMA. Let P(k) hold. By
Lemma 2, FMA's stack after move M_k is

    RQ Q_1 Q_2 ... Q_m = [RQ:U']

and the parser's stack after move M'_k = M_k is

    [y] q_1 q_2 ... q_m

for some m, where q_i ∈ Q_i, 1 ≤ i ≤ m. Let the next input symbol be h (h is
in u or is First v). We prove by contradiction P(k+1), i.e. that in
addition, r' ≥ k+1 and M_{k+1} = M'_{k+1}.

    (1) Assume r' < k+1; by the induction hypothesis, k ≤ r', so this
        forces k = r', i.e. the parser's last move was M'_{r'}. Then
        ([yU'],⊥) = accept, so that m = 1, [y] = [], and U' = S' (recall
        production S → S'⊥). Since there is a unique state in the parser
        having accessing symbol S', we have {q_1} = Q_1 and PD(q_1,⊥) =
        {accept}, so that for FMA, MOVES = {accept}. Thus FMA cannot make
        move M_{k+1}. Hence by contradiction, r' ≥ k+1.

    (2) Assume r' ≥ k+1, but M_{k+1} ≠ M'_{k+1}. {M'_{k+1}} = PD(q_m,h).
        But since q_m ∈ Q_m, MOVES = ∪ {PD(q,h) | q ∈ Q_m} would contain
        both M_{k+1} and M'_{k+1}. By case "otherwise" of FMA, FMA would
        not make move M_{k+1}. Hence M_{k+1} = M'_{k+1}.

Hence P(k+1) holds when P(k) holds, hence P(r) and our conclusion.

Corollary.  If  we  let  RQ  = ?,  we  have  immediately  that  U is  a DVF  of  uv. 
We  claimed  in  section  2 that  FMA  parses  as  much  as  it  can  until  it  must 
refer  to  nonexistent  left  context.  We  formalize  this  intuition  below. 

Definition 3. $U \in V^{*}$ is the maximal RDVF (MRDVF) of sentence suffix
x with respect to $RQ \subseteq K$ iff the following three conditions imply that
$([yU'], v') \vdash^{*} ([yU], v)$:

(1) U is an RDVF of x with respect to RQ where
    $U \to^{*} u$ and x = uv for some $u, v \in T^{*}$,

(2) U' is any other RDVF of x with respect to RQ where
    $U' \to^{*} u'$ and x = u'v' for some $u', v' \in T^{*}$,

(3) there exists valid prefix y such that
    y accesses some $q \in RQ$ and
    $([y], uv) \vdash_{read} \ldots \vdash$ accept.

Thus, by the definition of DVF's, $([y], uv) \vdash^{*} ([yU'], v') \vdash^{*} ([yU], v)$,
so that an MRDVF U is "as far up" the derivation tree of yuv as possible.
v must be a suffix of v', so that $|u| \ge |u'|$, i.e. U derives the longest
possible prefix of x. If v = v' then we see that U is reduced as much as
possible, since then $U \to^{*} U'$. An algorithm that produces the MRDVF would
therefore read as far as it could into the input, and reduce as much as it
could. It is clear from the definition that the MRDVF is unique.

FMA, when started in state RQ, does not always compute the MRDVF. This
is because FMA is restricted to using the same parsing technique as the
parser and therefore to the same finite look-ahead. An algorithm superior to
FMA might scan all of x, perhaps discovering some contextually significant
symbol located towards the end of x that could help it parse earlier text.
An FMA based on an SLR machine might be bested by one based on an LALR
machine for the same grammar. But given the limitation of the base parser,
FMA does the best it can. These restrictions are encoded in the following
theorem that formalizes FMA's performance in terms of its base parser.

Theorem 2. Consider suffix uv of a sentence. If there exist $RQ \subseteq K$,
integer $r \ge 1$ and a sequence of moves $M_1 \ldots M_r$ such that

(i)   $M_1$ = read,

(ii)  there exists some state $q \in RQ$ and $U \in V^{*}$ such that
      $(q, uv) \vdash_{M_1} \ldots \vdash_{M_r} ([q:U], v)$, and

(iii) there exists no valid prefix y, integer k < r,
      and configurations C and C' such that
      y accesses some $q' \in RQ$,
      $([y], uv) \vdash_{M_1} \ldots \vdash_{M_k} C \vdash_{M'_{k+1}} C'$,
      and $M'_{k+1} \ne M_{k+1}$,

then FMA: $(RQ, uv) \vdash_{M_1} \ldots \vdash_{M_r} ([RQ:U], v)$.

Proof. By induction. We prove P(r), where P(k) is defined for
$k \le r$ as

   FMA: $(RQ, uv) \vdash_{M_1} \ldots \vdash_{M_k} C''$

for some configuration C''. P(1) holds since $M_1$ = read and FMA reads as its
first move, by step Init of FMA. Let P(k) hold; we show that P(k+1) holds.
Let q be as in hypothesis (ii). If P(k), then by Lemma 2 we have

   FMA: $(RQ, uv) \vdash_{M_1} \ldots \vdash_{M_k} (RQ\ Q_1 \ldots Q_m, R)$   (= C'')

   $(q, uv) \vdash_{M_1} \ldots \vdash_{M_k} (q\ q_1 \ldots q_m, R)$

for some m and R, where $q_i \in Q_i$, $1 \le i \le m$. The parser's next move is
$M_{k+1}$ from $(q\ q_1 \ldots q_m, R)$. We show by contradiction that FMA makes
move $M_{k+1}$ from $(RQ\ Q_1 \ldots Q_m, R)$. Assume instead that FMA does not
make move $M_{k+1}$. Consider the possible ways in which this can happen. Let

   MOVES $= \bigcup_{q \in Q_m} PD(q, \text{First } R)$:

(1) MOVES = $\{M_{k+1}\}$ = {A → w}, but FMA cannot make move A → w because
    m < |w|. Then neither can the parser make move A → w by the
    definition of $\vdash$; m < |w| implies that there exists no y' such
    that $q\ q_1 \ldots q_m = [q:y'w]$.

(2) MOVES = {M} and $M \ne M_{k+1}$. But since $q_m \in Q_m$, $M_{k+1} \in$ MOVES,
    so that MOVES $\ne$ {M}.

(3) MOVES = {}. But again since $q_m \in Q_m$, $M_{k+1} \in$ MOVES.

(4) |MOVES| > 1. Then let $M'_{k+1} \in$ MOVES, $M'_{k+1} \ne M_{k+1}$. Let
    $q\ q_1 \ldots q_m = [q:y']$. There must exist some $q' \in RQ$, $q' \ne q$,
    such that $\{M'_{k+1}\} = PD(\text{Top}[q':y'], \text{First } R)$. Let y access q'.
    Then

       $([y], uv) \vdash_{M_1} \ldots \vdash_{M_k} ([yy'], R) \vdash_{M'_{k+1}} C'$

    for some C'. But this contradicts hypothesis (iii).

We have shown all possibilities contradictory; hence P(k+1) and thus P(r):

   FMA: $(RQ, uv) \vdash_{M_1} \ldots \vdash_{M_r} C''$

for some C''. But since $(q, uv) \vdash_{M_1} \ldots \vdash_{M_r} ([q:U], v)$,
C'' = ([RQ:U], v).

Corollary. FMA applied to a sentence suffix makes the greatest number r
of moves possible, where r is as defined in Theorem 2.

Proof. Merely let the q of hypothesis (ii) be such that r is maximized.

Apart from the significance of DVF's in validating error repairs, DVF's
satisfy other useful properties. The "next move" property is helpful in
selecting error repairs. Let uv be a sentence suffix, and U be the DVF of
uv returned by FMA (when started with ?). When FMA halts, the value of
MOVES is such that

$$\mathrm{MOVES} = \bigcup_{\substack{y \in V^{*} \\ [yU]\ \text{a valid prefix}}} \{\, M \mid ([yU], v) \vdash_{M} C \text{ for some } C \,\}$$

In  other  words,  MOVES  contains  all  the  moves  that  the  parser  may  make  from 
some  configuration  ( [yU] ,v) . Intuitively,  since  FMA  parses  without  knowing 
y,  at  each  step  MOVES  represents  the  set  of  moves  for  all  possible  y's. 

The  utility  of  the  next  move  property  is  illustrated  as  follows.  For 
the  Algol  example  of  the  previous  section,  MOVES  would  contain  the  two  moves 
indicating  "reduce  by  production  (1)"  and  "reduce  by  production  (2)",  where 
U = "do  Stmt".  The  next  move  property  says  that  if  we  find  some  y such  that 
the parser makes move M from configuration ([yU],v), then M must be one of
those  two  reductions.  Either  reduction  puts  a constraint  on  y;  it  must  end 
in  either  "for  Id  :=  Exp  step  Exp  until  Exp"  or  "while  Exp".  We  may  thus 
sometimes  use  the  elements  of  MOVES  to  guide  us  in  the  selection  of  y's. 

We call these reductions "long reductions" because if performed during the
forward move, they would attempt to pop the ? state set (in the Algol example
FMA halted in case "otherwise"). Such long reductions can sometimes provide
"instant solutions" to some errors. In this example, a comparison of the
stack with the set MOVES shows that we should patch up the stack by
inserting the missing Exp and continue. In practice, we may simply search the
stack preceding the point of error detection for some state that can read
the left part of either production (we call this technique "stack forcing").
We mention this again in the next section. MOVES may contain elements
that are not long reductions, such as "read" or a "short reduction," but
we do not yet know how best to make use of this information. We formalize
the next move property as follows:

Theorem 3. Let x be a sentence suffix, $RQ \subseteq K$, U an RDVF of x with
respect to RQ with $U \to^{*} u$ and x = uv, and

$$\mathrm{MOVES} = \bigcup_{q \in \text{Top}[RQ:U]} PD(q, \text{First } v).$$

Then we have the following: Let

$$X = \bigcup_{\substack{y \in V^{*} \\ [yU]\ \text{a valid prefix} \\ \text{Top}[y] \in RQ}} \{\, M \mid ([yU], v) \vdash_{M} C \text{ for some } C \,\}.$$

Then MOVES = X.

Proof. Let $M \in$ MOVES. By the definition of MOVES, there exists some
$q \in$ Top [RQ:U] such that $M \in PD(q, \text{First } v)$, and hence some $q' \in RQ$ such
that Top [q':U] = q. Let y access q'; then $M \in PD(\text{Top}[yU], \text{First } v)$ so that
there exists C such that $([yU], v) \vdash_{M} C$. Hence MOVES $\subseteq$ X.

Consider now $M \in X$; corresponding to M there is a valid prefix yU such
that $M \in PD(\text{Top}[yU], \text{First } v)$ and Top $[y] \in RQ$. But Top [yU] $\in$ Top [RQ:U]
by Lemma 1, so that by the definition of MOVES, $M \in$ MOVES. Hence X $\subseteq$ MOVES,
and our conclusion.

Corollary. If RQ = ?, then |MOVES| $\ge$ 1.

Proof. Since x is a sentence suffix, there exists some valid prefix y
such that $S \to^{*} yx$; thus $([y], x) \vdash^{*}$ accept. Without loss of generality let
read $\in$ PD(Top [y], x). Then by the RDVF definition, $([y], x) \vdash^{*} ([yU], v)$.
Let C be such that $([yU], v) \vdash_{M} C$ for some M (there must be at least one
since $([y], x) \vdash^{*}$ accept); $M \in$ MOVES. Hence |MOVES| $\ge$ 1. (For a similar
result where RQ $\ne$ ? see [D&R 77].)

We can further use the next move property to help pinpoint errors.
If FMA: $(?, uv) \vdash^{*} ([?:U], v)$ but halts in case {}, i.e. MOVES = {}, then uv
is not a sentence suffix. That is, an error has occurred somewhere in the
text uv, because there exists no y such that $S \to^{*} yuv$. More specifically,
since we are dealing with LR(1) parsers, the error has occurred in the
"window" comprised of the first |u|+1 symbols of uv.

In  summary,  we  have  shown  that  FMA  (1)  provides  an  inexpensive  test 
for  stack  replacements,  (2)  sometimes  points  us  directly  to  the  repair  we 
need  to  continue  the  parse,  and  (3)  sometimes  finds  a "window"  within  which 
an  error  has  occurred.  We  do  not  know  how,  in  the  general  case,  to  come  up 
with  stack  replacements.  In  a more  specialized  case  in  which  we  assume 
some  knowledge  of  the  types  of  errors,  we  have  a chance  of  designing  stack 
replacements. 

4. REPAIR STRATEGIES USING FMA

Given FMA and its formal properties, we now proceed to develop an
algorithm that finds a usable configuration in which to restart the parser.
In our initial analysis we make the "simple error assumption" (SEA), viz.
the non-sentence z in question resulted from a sentence via a single
"mutilation": an insertion, a replacement, or a deletion of a single terminal
symbol.


   Insertion:    z = ytx  and  $S \to^{*} yx$    but not  $S \to^{*} ytx$
   Replacement:  z = ytx  and  $S \to^{*} yt'x$  but not  $S \to^{*} ytx$
   Deletion:     z = yx   and  $S \to^{*} ytx$   but not  $S \to^{*} yx$

In the next few paragraphs we assume an LR(k), as opposed to SLR(k) or
LALR(k), parser and we even assume that the parser detects the error at
the point of mutilation. Then we generalize gradually and discuss the
consequences.

Suppose the parser detects an error in configuration (Z,tx). Thus, t
is an unexpected symbol in the left context spelled by Z. Suppose further
that we have reason to believe that an insertion of t occurred. How could
we confirm that suspicion? A straightforward way is simply to determine
if $(Z, x) \vdash^{*}$ accept; i.e. delete t and resume parsing. Similarly, if we
thought the mutilation was the replacement of some terminal t' by t, we must
resume with (Z,t'x), and if the deletion of some t' just prior to t, then
(Z,t'tx).
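
The three trial configurations above can be enumerated mechanically. The following is a minimal sketch, not the authors' implementation; the function name `candidate_inputs` and the representation of the input as a list of token strings are assumptions made for illustration. Repairs are named from the repairer's point of view (an inserted t is repaired by deleting it, and so on).

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def candidate_inputs(
    t: str, x: Sequence[str], terminals: Iterable[str]
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (repair kind, input to resume on) for the erroneous input t x.

    Hypothetical helper: the stack Z is left untouched, so only the
    remaining input of each trial configuration is produced.
    """
    rest = list(x)
    alphabet = sorted(set(terminals))  # fixed order so results are repeatable
    # t was inserted: delete it and resume on x, i.e. (Z, x).
    yield ("delete", rest)
    # some t' was replaced by t: resume on t' x, i.e. (Z, t'x).
    for t2 in alphabet:
        if t2 != t:
            yield ("replace", [t2] + rest)
    # some t' was deleted just before t: resume on t' t x, i.e. (Z, t'tx).
    for t2 in alphabet:
        yield ("insert", [t2, t] + rest)


cands = list(candidate_inputs("t", ["x"], {"a", "t"}))
```

Each candidate would then be handed to a trial parse; with |T| terminals there are 1 + (|T| - 1) + |T| candidates per stack position, which is why limiting the cost of each trial matters.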

Now, in the error recovery context we have no clue as to which of the
above repairs may work, so we must try them all. Furthermore, if none of
them work, we can conclude for an LR(k) parser and under SEA that the
mutilation occurred left of the point of error detection, i.e. the parser somehow
incorporated the mutilation on its stack. In the case of an SLR(k) or
LALR(k) parser, even if the correct "unmutilation" is found, the above trials
may not work since the parser may have made reductions (by looking ahead at
or ignoring the unexpected symbol) that the corresponding LR(k) parser
would not have made. Repairs in these cases will involve some form of
backing up the parser. But before considering those implications, let us
consider the use of FMA to reduce the cost of trial parses.

To limit the repeated parsing of x we apply FMA to x recursively until
it has been reduced to a sequence of DVF's $U_1, \ldots, U_n$. We call this
process FMA+, which can be defined as follows:

   FMA+(x) =
      if x = ⊥ then ⊥ else
         U such that FMA:(?,x) ⊢* ([?:U],v)
         followed by FMA+(v)
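
The recursion in FMA+ unrolls into a simple loop. Below is a minimal sketch under stated assumptions: `fma` is a caller-supplied function returning the pair (U, v) for an input x, the end marker is the string "⊥", and `toy_fma` is a stand-in used only to exercise the loop (it is not a forward move over a real grammar).

```python
from typing import Callable, List, Sequence, Tuple

END = "⊥"  # assumed representation of the end-of-input marker

Fma = Callable[[Sequence[str]], Tuple[List[str], List[str]]]


def fma_plus(x: Sequence[str], fma: Fma) -> List[List[str]]:
    """Apply fma repeatedly until only the end marker remains.

    Returns the sequence of DVFs U_1, ..., U_n (the trailing end marker
    of the definition is left implicit).
    """
    dvfs: List[List[str]] = []
    rest = list(x)
    while rest != [END]:
        U, v = fma(rest)
        if len(v) >= len(rest):
            # FMA always reads at least one symbol in step Init.
            raise ValueError("fma made no progress")
        dvfs.append(U)
        rest = list(v)
    return dvfs


def toy_fma(x: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Stand-in: consume through the next ';' or up to the end marker."""
    n = 1
    while x[n - 1] != ";" and x[n] != END:
        n += 1
    return list(x[:n]), list(x[n:])


result = fma_plus(["a", ";", "b", "c", END], toy_fma)
```

The progress check reflects the fact that each application of FMA consumes at least the first symbol, so the loop terminates on any finite input.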

Furthermore, before trying the insertions, not having found a deletion or
replacement that will work, we should apply FMA+ to tx, thus producing some
$U'_1, \ldots, U'_m$ for some $m \le n+1$. Let $u_i$ be the terminal string that was reduced
to $U_i$ by applying FMA+ to x. We have m = n+1 and $U'_{i+1} = U_i$ for $1 \le i \le n$ if
the first application of FMA to tx parses t and then halts; m = n and
$U'_i = U_i$ for $1 < i \le n$ if FMA parses t and all of $u_1$ but then halts; m = n-1 and
$U'_i = U_{i+1}$ for $2 \le i < n$ if t, $u_1$, and $u_2$ are similarly combined by FMA.
Taking advantage of the fact that $u_1, \ldots, u_n$ have been reduced to $U_1, \ldots, U_n$
by the previous application of FMA+ to x, we can avoid applying FMA+ to tx
by instead using the following algorithm to "attach" t to the "extended
forward context" $U_1 \ldots U_n$, producing the same result.

algorithm Attach
   input h, C -- the symbol to be attached and
                 the sequence of DVFs to attach it to.
   output a Boolean value, and giving
                 the resulting sequence of DVFs.
   let P be a path such that Consume_DVF(?,h) gives P
   while C is not null do
      let P' be a path variable
      if Consume_DVF(P, First C) gives P'
      then P ← P'; C ← Rest C
      elseif P' = "path ends in error"
      then return False
           # not giving anything; irrelevant.
      else return True
           giving Augment(Spelling P, C)
      fi
   od
   return True giving Spelling P
end Attach

In the above we have assumed the operation, Augment, on sequences that
provides a sequence of length n+1 by adding a new element, the left operand,
to the front (left) of a sequence of length n, the right operand.

Non-immediate detection. Suppose that none of the deletions,
replacements, or insertions succeed. An easy way to proceed next is to start
backing down the stack, one symbol at a time, trying deletions and
replacements of each symbol h, then if none of these succeed, attaching h to the
previous extended forward context and trying insertions in front of h, just
as we did for the unexpected symbol, which has now been "exonerated" (for
LR(k) parsers, at least). We summarize this entire strategy as follows.

algorithm Error_recovery
   input (Z,R) -- the erroneous configuration.
   output the repaired configuration.
   let h and h' = First R, Z' = Z,
       EFC and EFC' = FMA+(Rest R)
   while Z' is not empty do
      # Try deletion, replacements,
      # attachment, then insertions.
      let C be a configuration variable
      if Try (Z', {Λ}, EFC) gives C then return C fi
      if Try (Z', T, EFC) gives C then return C fi
      if not Attach (h,EFC) gives EFC then
         exit fi   # implies a "window"
      if Try (Z', T, EFC) gives C then return C fi
      h ← Accessing_symbol (Top Z'); Pop Z'
   od
   if Try_stack_forcing (Z,V,h',EFC') gives C
   then return C fi
   return (S',⊥)   # i.e. give up by
                   # returning accept.
end Error_recovery

Note that we return from Error_recovery, when we Try a repair that
succeeds, with the repaired configuration C. However, if an error is
detected by Attach, we exit from the while loop, having isolated the
mutilation to within a "window" comprising the text from the leftmost token of
the phrase associated with the symbol h up to the original unexpected
symbol, inclusive. (A message should be printed to this effect.) Perhaps in
this case we should just delete all the text in the window and call
Error_recovery recursively. Instead we have indicated above to Try_stack_forcing
as suggested in Section 3. At this point further investigation and
development of the overall algorithm is needed.

Backing down the stack is, of course, an attempt at repairing damage
caused by the mutilation. In some cases the mutilation will not have
affected the phrases around it. A replaced symbol, for example, may still be on
the stack, unreduced or simply reduced. Then a deletion, replacement, or
insertion of a single symbol may precisely undo the mutilation. In fact,
to increase the likelihood of a useful repair we have found it worthwhile to
Try nonterminals as well as terminals, i.e. Try(Z',V,EFC), as replacements
and insertions. This pays off when, for example, a mutilation has affected
only one phrase.

On the other hand, a mutilation only belatedly detected can have caused
an arbitrarily large amount of "damage" to occur on the stack, in the sense
that many reductions may have occurred that would not have on the unmutilated
string. For example, inserting a semicolon before an operator in the right
part of an assignment statement, e.g. "X = X + Y; * Z;", typically results
in the text to the left of the semicolon being reduced to a statement; but
then we are left with an expression fragment to the right of the semicolon.
Ideally, in such cases we would like to partially "unparse" some symbol(s)
on the stack, then Try the repairs. Another example is the PL/I conditional
statement, which when the if is deleted, may look like an assignment
statement up to the then: "...; X = Y + Z then ... else ...;". Here the
unwanted reduction of X = Y + Z to statement might occur with an LALR(1) or
SLR(1) parser but not with an LR(1) parser.

We are still investigating possible approaches to recovering from such
potentially massive damage, approaches about which something formal can be
said; and we are looking for grammatical restrictions that might limit such
damage. However, since that research is incomplete, we refrain from
discussing the ideas here, other than to note that "stack forcing" mentioned in
Section 3 above appears to have good development potential. Ultimately,
any scheme used must have a significantly greater potential for facilitating
"upper level" parsing after making a repair (FMA will have done the "lower
level" parsing) than it has potential for causing an avalanche of spurious
error messages, e.g. if the repair discards several left bracket symbols.

Finally, note that we may give up by telling the parser it is done, i.e.
by returning to it the accepting configuration. This is not unreasonable
since we have already partially parsed the input beyond the point of error
detection and we are only giving up the opportunity to parse the remaining
upper level, thus losing the opportunity to detect some other errors. Now in
practice we do not parse all of the remaining input, but rather we stop FMA+
at a convenient point after it has produced at least seven, say, symbols of
extended forward context. We consider a trial successful, then, if after
the repair, all of the extended forward context can be parsed; then we
return to the parser the resulting stack and the remaining input. For
expositional purposes, however, we continue the presentation in terms of the
simpler, if impractical, approach of partial parsing to the end of the
program. The practical modifications are not difficult to make, but they
would obscure the presentation.

algorithm Try
   input Z, V, EFC -- a state stack,
         vocabulary, and a sequence of DVF's.
   output a Boolean value, giving a configuration.
   for s ∈ V do
      let Z' = Z
      let EFC' = if s = Λ then EFC
                 else Augment(s,EFC) fi
      while EFC' is not empty and
            Consume_DVF(Z', First EFC') gives Z'
      do EFC' ← Rest EFC' od
      if EFC' is empty then
         return True giving (Z',⊥) fi
   od
   return False   # giving nothing; irrelevant.
end Try
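
The control structure of Try can be sketched in executable form. This is an illustration only: `consume_dvf` is a hypothetical callback standing in for Consume_DVF (it returns the new stack, or None when the parse of that DVF fails), `LAMBDA` stands for the empty string Λ, and `toy_consume` is a toy acceptance rule (alternating operands "n" and operators) rather than a real LR machine.

```python
from typing import Callable, Iterable, List, Optional, Sequence

LAMBDA = None  # assumed stand-in for the empty string (the deletion trial)

Consume = Callable[[List[str], str], Optional[List[str]]]


def try_repairs(
    Z: Sequence[str],
    V: Iterable[Optional[str]],
    efc: Sequence[str],
    consume_dvf: Consume,
) -> Optional[List[str]]:
    """Return the stack reached by the first symbol s in V whose trial
    parses all of the extended forward context, or None if none does."""
    for s in V:
        stack = list(Z)  # each trial starts from a fresh copy of Z
        trial = list(efc) if s is LAMBDA else [s] + list(efc)
        for dvf in trial:
            nxt = consume_dvf(stack, dvf)
            if nxt is None:
                break  # this trial fails; move on to the next s
            stack = nxt
        else:
            return stack  # every element of the trial was consumed
    return None


def toy_consume(stack: List[str], sym: str) -> Optional[List[str]]:
    """Toy rule: operands 'n' and operators must alternate."""
    expect_operand = (not stack) or stack[-1] == "+"
    if (sym == "n") == expect_operand:
        return stack + [sym]
    return None


repaired = try_repairs(["n"], ["n", "+"], ["n", "+", "n"], toy_consume)
```

Calling `try_repairs` with `[LAMBDA]` as the vocabulary corresponds to the deletion trial, and calling it after the unexpected symbol has been attached to the forward context corresponds to the insertion trials.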


Pragmatics. The algorithms presented above are idealistic in several
ways. We have already mentioned limiting FMA+ rather than allowing it to
proceed to the end of the program being parsed. Now we consider the
possibility of several mutilations to the program. In this case FMA+ may end in
error before accumulating the desired number of symbols; thus we simply
accept the first repair that successfully reaches the subsequent error
detection point.

However, it may happen that FMA+ will not detect a subsequent error,
due to its lack of "upper level" parsing. Any such error will have to be
detected after a repair is made. For example, suppose
"...; X < Y + Z then ... then ... else ...;" was created by deleting an if
and inserting a then, and suppose one relevant production is:
If_clause → if Bexp then. An error
is  detected  at  the  "Y"  is  skipped,  and  FMA+  is  invoked.  After  the 

first then the parser may be ready to reduce without needing to look ahead;
but no reduction can be made due to incomplete parsing to the left. So FMA+
calls FMA again, starting at the second then, and the inserted then is not
reported as an error by FMA+. Again the solution of choice is to take the
repair that results in getting the farthest into the extended forward
context before detecting a subsequent error.

Semantics. Given this error recovery scheme it is unlikely to be
worth trying to continue to drive "semantic routines" that perform static
semantic analysis and/or code generation, even after the first error is
encountered, since parsing proceeds in a non-canonical order while
recovering. On the other hand, in a compiler whose parser builds an
abstract-syntax tree, which is to be traversed subsequently to parsing for further
analysis and code generation, we may continue tree-building during FMA+;
and simple repairs will lead to knitting the subtrees together appropriately
for subsequent processing. Of course, gross repairs will result in a mangled
tree, but that in turn just presents an error detection and recovery
problem to the subsequent processor. Presumably, if a formal technique such
as an "affix grammar" [Kos 71] is adapted to describe the static semantics
of languages based on abstract-syntax trees, then automatic techniques can
be used to perform the analysis (see, e.g., [Wat 75] and [DeR 77]), and thus
our recovery algorithm may prove useful there, too.

Improving FMA+. Having nonterminals in look-ahead sets allows us to
construct an improved version of FMA+. When FMA is applied to a sentence
suffix, it may halt by encountering an inadequacy (case "otherwise"); i.e.
the next (terminal) symbol is insufficient to resolve the parsing conflict.
However, FMA+ immediately applies FMA to that symbol and what follows,
resulting in a DVF that may begin with a nonterminal that is sufficient to
resolve the conflict. This is because the nonterminal may represent an
arbitrarily long look-ahead, i.e. the phrase that was reduced to it and
perhaps one symbol beyond (due to the usual look-ahead). It behooves us
then to review recursively the decision at the end of the prior DVF each
time a new one is computed that begins with a nonterminal. This approaches
the non-canonical parsing of the LR(k,t) style as suggested by Knuth [Knu 65].

algorithm Super_FMA+
   input x -- a sentence suffix.
   output a sequence of DVFs derived from x.
   if x = ⊥ then return ⊥
   else
      let U, v be such that
         FMA:(?,x) ⊢* ([?:U],v)
      let S be an empty stack of paths
      Push [?:U] on S
      while v ≠ ⊥ do
         let U', v' be such that
            FMA:(?,v) ⊢* ([?:U'],v')
         v ← v'
         if First U' ∈ N (the nonterminals) then
            while S is not empty do
               let Z be a path variable
               if Consume_DVF (Top S, U') gives Z
               then U' ← Spelling Z; Pop S
               else Push [?:U'] on S; exit
                    # exit from inner loop
               fi
            od
         fi
      od
      return the sequence of DVFs spelled by
             the paths on S, followed by ⊥
   fi
end Super_FMA+

Even this algorithm can be improved. Each time we "restart" FMA, at
the beginning of the outermost while loop, we begin with "?", representing
no knowledge of left context whatsoever. But we do know something about the
left context in this case, viz. the possibilities are restricted to those
implied by the top state in the top path on S. Assume FMA has halted with
Q on the top of its stack. Then instead of restarting with state set ?,
we restart with state set $\bigcup_{q \in Q} RS(q)$, where

$$RS(q) = \{\, q' \mid S \to^{*} y'x \to^{*} yx,\ \ y, y' \in V^{*},\ x \in T^{*},\ \ y \text{ accesses } q,\ y' \text{ accesses } q' \,\}$$

The states from which First x can be read, after y is (optionally) reduced
to some y', are in RS(q). The idea of using a restricted restart state has
been suggested by Tai [Tai 77] for use in non-canonical SLR(1) parsing,
although his restart states are different from ours and do not apply to LR(k)
parsers in general.

We have implemented such restricted restart states and have observed
that, while they do in fact improve error recovery, their expense in terms
of the size of the parser + error recovery machine may be too great. For
the statistics see the end of Chapter 5.

5. MAKING FMA PRACTICAL


In  this  section  we  re-describe  FMA  as  an  algorithm  that  manipulates 
not  state  sets  but  pre-computed  states,  thereby  making  it  practical. 

FMA computes state sets dynamically by referring to the parser's
states; see cases {read} and {A → w} of FMA. There is no reason why we
cannot precompute these state sets and the transitions between them; this
gives rise to a separate set of states for FMA.

We compute these states as follows. Let K be the set of parser states.
The set K' of FMA states is computed by beginning with K' = {?}. Repeatedly
add to K' the successors of state sets in K', where for $s \in V$, the
s-successor of $Q \in K'$ is $\{ q' \mid q \xrightarrow{s} q' \text{ and } q \in Q \}$. We use ERSIGMA to
mean the thus computed transition function for K'. Now we define the
decision function PD(Q,h), where Q is a state in K' and $h \in V$, in terms of the
states in Q. Simply,

$$PD(Q, h) = \bigcup_{q \in Q} PD(q, h)$$

Observe that the computation of MOVES in FMA is just this same computation.
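
The construction of K' and ERSIGMA is a subset construction over the parser's transition function. The sketch below is a minimal illustration under stated assumptions: parser states are small integers, `sigma` and `pd` are dictionaries, "?" is modelled as the set of all parser states (our reading of "no knowledge of left context"), and the four-state toy tables are invented for the example rather than derived from any grammar in the text.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

StateSet = FrozenSet[int]


def build_fma_states(
    K: Iterable[int], sigma: Dict[Tuple[int, str], int], vocab: Iterable[str]
) -> Tuple[Set[StateSet], Dict[Tuple[StateSet, str], StateSet]]:
    """Compute K' (a set of state sets) and the transition function ERSIGMA."""
    symbols = list(vocab)
    unknown: StateSet = frozenset(K)  # assumption: "?" means any parser state
    states: Set[StateSet] = {unknown}
    ersigma: Dict[Tuple[StateSet, str], StateSet] = {}
    work: List[StateSet] = [unknown]
    while work:
        Q = work.pop()
        for s in symbols:
            # s-successor of Q: every q' with q --s--> q' for some q in Q
            succ = frozenset(sigma[(q, s)] for q in Q if (q, s) in sigma)
            if not succ:
                continue  # no state in Q can read s
            ersigma[(Q, s)] = succ
            if succ not in states:
                states.add(succ)
                work.append(succ)
    return states, ersigma


def pd_of_set(Q: StateSet, h: str, pd: Dict[Tuple[int, str], Set[str]]) -> Set[str]:
    """PD(Q, h): the union of the parser decisions PD(q, h) over q in Q."""
    moves: Set[str] = set()
    for q in Q:
        moves |= pd.get((q, h), set())
    return moves


# Invented toy tables, for illustration only.
K = {0, 1, 2, 3}
SIGMA = {(0, "a"): 1, (2, "a"): 3, (1, "b"): 2}
PD = {(1, "b"): {"read"}, (3, "b"): {"reduce A -> a"}}
states, ersigma = build_fma_states(K, SIGMA, ["a", "b"])
```

In the toy tables the a-successor of "?" is the non-singleton set {1, 3}, whose decision on "b" contains two moves; this is the situation that FMA handles in case "otherwise".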

Thus,  algorithm  FMA'  below  achieves  the  same  effect  as  FMA. 

algorithm FMA'
   input -- as in FMA
   output -- as in FMA
   Init:
      let Z be the stack consisting only of ?
      Push ERSIGMA(?, First R) on Z; R ← Rest R
   repeat
      let h = First R, Q = Top Z, and MOVES = PD(Q,h)
      select MOVES:
      case {read}:
         Push ERSIGMA(Q,h) on Z; R ← Rest R
      case {A → w}:     # Reduce, if possible:
         if |Z| > |w| then
            Pop |w| states off of Z
            Push ERSIGMA(Top Z, A) on Z
         else return (Spelling Z, R) fi
      case {}           # We hit an error
      or {accept}       # R is ⊥
      or otherwise:     # |PD| > 1
         return (Spelling Z, R)
   end repeat
end FMA'

It should be evident that FMA and FMA' are equivalent. FMA' is as fast
as the LR parsing algorithm save the check in case {A → w} that |w| states
reside on the stack.
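
To make the behaviour of FMA' concrete, here is a minimal executable sketch. Everything named in it is an assumption for illustration: the tables `ERSIGMA` and `PD` were built by hand for the small grammar S' → L ⊥, L → L , E | E, E → n with SLR(1) look-aheads, FMA states are given string names (for example "q25" for the state set {2, 5}), and reductions are encoded as tuples ("reduce", A, |w|).

```python
from typing import Dict, List, Sequence, Set, Tuple

END = "⊥"
RE = ("reduce", "E", 1)   # E -> n
RL1 = ("reduce", "L", 1)  # L -> E
RL3 = ("reduce", "L", 3)  # L -> L , E

# Hand-built, hypothetical FMA' tables for the grammar above.
ERSIGMA: Dict[Tuple[str, str], str] = {
    ("?", "n"): "q3", ("?", "E"): "q25", ("?", "L"): "q1", ("?", ","): "q4",
    ("q4", "n"): "q3", ("q4", "E"): "q5", ("q1", ","): "q4",
}
PD: Dict[Tuple[str, str], Set[object]] = {
    ("q1", ","): {"read"}, ("q1", END): {"accept"}, ("q4", "n"): {"read"},
}
for la in (",", END):
    PD[("q3", la)] = {RE}
    PD[("q25", la)] = {RL1, RL3}  # a conflict: case "otherwise"
    PD[("q5", la)] = {RL3}


def fma_prime(R: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (Spelling Z, remaining input) for an input R ending in END."""
    rest = list(R)
    states = ["?"]            # Z, the stack of FMA states
    spelling: List[str] = []  # accessing symbols of the states above "?"
    # Init: the first symbol is read unconditionally.
    states.append(ERSIGMA[("?", rest[0])])
    spelling.append(rest[0])
    rest = rest[1:]
    while True:
        h, Q = rest[0], states[-1]
        moves = PD.get((Q, h), set())
        if len(moves) != 1:
            return spelling, rest  # error ({}) or conflict ("otherwise")
        (move,) = moves
        if move == "read":
            states.append(ERSIGMA[(Q, h)])
            spelling.append(h)
            rest = rest[1:]
        elif move == "accept":
            return spelling, rest
        else:
            _, lhs, size = move
            if len(states) <= size:
                return spelling, rest  # long reduction: would pop "?"
            del states[len(states) - size:]
            del spelling[len(spelling) - size:]
            states.append(ERSIGMA[(states[-1], lhs)])
            spelling.append(lhs)


dvf, remaining = fma_prime([",", "n", ",", "n", END])
```

On the input ", n , n ⊥" the sketch reads "," and "n", reduces n to E, and then halts because the next move is a reduction by L → L , E, whose right part is longer than the available stack. On "n , n ⊥" it halts one step after reducing n to E, since the state named "q25" offers two different reductions.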

Now that FMA' manipulates states rather than state sets, we can suggest
a space optimization. Suppose for some $q \in K$, $\{q\} \in K'$ (this occurs often).
If $q \xrightarrow{s} q'$ is a transition, then $\{q\} \xrightarrow{s} \{q'\}$ is also a transition.
Once FMA' pushes a state {q} on its stack, and until it sometime later pops
{q}, it will behave as if it had pushed state q on its stack. Thus we may
"share" state {q} in K' with state q in K; states in K' having transitions
into {q} can be modified to instead have the same transitions into q. Such
sharing reduces the storage required for the parser/error recovery package.
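
Sharing of singleton states amounts to redirecting transitions. The following is a minimal sketch, assuming ERSIGMA is held as a dictionary keyed by (state set, symbol); the function name `share_singletons`, the tagged-tuple targets, and the toy table are all hypothetical.

```python
from typing import Dict, FrozenSet, Tuple, Union

StateSet = FrozenSet[int]
Target = Union[Tuple[str, int], Tuple[str, StateSet]]


def share_singletons(
    ersigma: Dict[Tuple[StateSet, str], StateSet]
) -> Dict[Tuple[StateSet, str], Target]:
    """Redirect transitions into a singleton {q} to the parser state q.

    Transitions leaving a singleton are dropped, because once FMA' has
    entered parser state q it follows the parser's own transitions.
    """
    shared: Dict[Tuple[StateSet, str], Target] = {}
    for (Q, s), succ in ersigma.items():
        if len(Q) == 1:
            continue  # {q} itself is no longer stored in K'
        if len(succ) == 1:
            (q,) = succ
            shared[(Q, s)] = ("parser", q)  # enter the parser's state q
        else:
            shared[(Q, s)] = ("fma", succ)  # remain among the extra states
    return shared


# Invented toy transition table, for illustration only.
ALL = frozenset({0, 1, 2, 3})
TOY = {
    (ALL, "a"): frozenset({1, 3}),
    (ALL, "b"): frozenset({2}),
    (frozenset({1, 3}), "b"): frozenset({2}),
    (frozenset({2}), "a"): frozenset({3}),
}
shared = share_singletons(TOY)
```

Only the non-singleton state sets and their outgoing transitions remain as extra storage, which is the source of the savings reported below for Pascal, XPL, and PAL.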

The following state sharing criterion, satisfied by (but not only by)
the singleton states in K', determines whether state sharing may occur:
For any q ∈ K and Q ∈ K', Q may share with q iff for every y in V*, if y
spells a path from q to q' and a path from Q to Q', then
PD(q',s) = PD(Q',s) for every s ∈ V. In other words, the parsing
decisions that Q' and q' make must be the same. Some states in K' other
than singleton sets also satisfy this criterion. To see this, let
t₀ = {A → t·} and t₁ = {A → t·, B → t·}, both members of K (A → t· and
B → t· are final items, productions whose right part has been
recognized; see [A&J 74] or [DeR 71]). Let {t₀,t₁} ∈ K'. Note that
t₀ ∪ t₁ = t₁. Then if PD(t₁,s) = PD({t₀,t₁},s) for every s ∈ V, {t₀,t₁}
may be shared with t₁. This is the same as requiring that the look-ahead
for production A → t in state t₀ be a subset of the look-ahead for
production A → t in state t₁. Non-singleton states that can be shared
occur in practice, but they are non-trivial to determine. Singleton
states are easy to find when generating K'. Figure 2 shows the state
diagram for K' with singleton states shared with states in the state
diagram of Figure 1.
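
One way to test the criterion mechanically is to walk both machines in
lock step and compare parsing decisions at every pair of states reached
by the same string. The Python sketch below does this under assumed
table layouts: `delta_k` and `delta_kp` map a state to a dictionary of
symbol to successor, and `pd_k` and `pd_kp` map a state to a dictionary
of symbol to decision, with missing entries meaning "error". All names
are hypothetical, and this is an illustration of the stated criterion,
not the authors' procedure.

```python
from collections import deque

def may_share(q, Q, delta_k, delta_kp, pd_k, pd_kp):
    """Return True iff Q (in K') may share with q (in K).

    Explores every pair (q', Q') reachable from (q, Q) by a common
    string y, and requires PD(q', s) == PD(Q', s) for all symbols s.
    """
    seen = {(q, Q)}
    work = deque(seen)
    while work:
        a, b = work.popleft()
        # Parsing decisions must agree on every symbol.
        if pd_k.get(a, {}) != pd_kp.get(b, {}):
            return False
        succ_a = delta_k.get(a, {})
        succ_b = delta_kp.get(b, {})
        # Follow only symbols that spell a path in BOTH machines.
        for sym in succ_a.keys() & succ_b.keys():
            pair = (succ_a[sym], succ_b[sym])
            if pair not in seen:
                seen.add(pair)
                work.append(pair)
    return True
```

The pair set `seen` bounds the search by |K| times |K'| entries, so the
test terminates even when the state diagrams contain cycles.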

Due to state sharing, the percentage of extra states needed over the
original parser is only about 20-50%, and the percentage of extra
transitions about 39-78%, depending upon the grammar. For Pascal
[Wir 73] we need increases of 48% and 78%, respectively; for XPL
[MHW 70], 22% and 39%; and for PAL [A&U 72], 27% and 40%. With
restricted restart states included, the percentages for Pascal are 125%
and 426%, for XPL 80% and 259%, and for PAL 74% and 384%.

The significant differences between our approach and that of Druseikis
and Ripley [D&R 76,77] are that they compute the states K' via the LR(0)
constructor algorithm, using actual sets of LR items (see [A&J 74]), and
that they do not show how to compute the look-ahead sets needed by FMA'
for LALR(1) or LR(1) parsers. Our technique works equally well for
SLR(1), LALR(1), or any other LR-style parser.
Fig. 2. State diagram representing transitions between states in K'.
Singleton states in K' have been shared with the corresponding
states in K. Reductions associated with ? have been omitted,
since FMA' never considers them.
6.  CONCLUSIONS


The proof of an error recovery algorithm is in its performance in a
practical environment, quite apart from any nice theoretical properties
it might have. Druseikis and Ripley [D&R 76] were kind enough to share
with us a tape containing erroneous student Pascal programs. We ran our
preliminary implementation on some of them, given a Pascal grammar
deduced more-or-less mechanically from the Pascal syntax diagrams
[Wir 73].

Each repair selected by the algorithm was rated "excellent" if it
repaired the text as a human reader would have; "good" if it did not,
but still resulted in a reasonable program and no spurious errors;
"poor" if it resulted in one or more spurious errors (in fact, none
resulted in more than one spurious error); and "unrepaired" if no repair
was selected but we continued to parse via FMA+ rather than the parser.
The results follow:


    Excellent    Good     Poor     Unrepaired    Total

       32         21        9          14          76
      (42%)     (28%)    (12%)       (18%)      (100%)

We have counted spurious errors in these statistics. Note that 70% were
good or excellent. With some tuning, we hope to reduce the number of
poor and unrepaired responses. The unrepaired cases both rob us of
upper-level parsing and sometimes adversely affect recovery from other
nearby errors. We have no idea how different these statistics might be
for more "sophisticated" errors made by seasoned system programmers
intimately familiar with the language.
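
The percentages quoted for the Pascal trial follow directly from the
four counts. The short Python check below recomputes them; the variable
and function names are ours, and only the counts come from the text.

```python
# Counts of rated repairs reported for the student Pascal programs.
counts = {"Excellent": 32, "Good": 21, "Poor": 9, "Unrepaired": 14}

def percentages(counts):
    """Return each rating's share of the total, rounded to whole percent."""
    total = sum(counts.values())
    return {rating: round(100 * n / total) for rating, n in counts.items()}

total = sum(counts.values())
shares = percentages(counts)
good_or_better = round(100 * (counts["Excellent"] + counts["Good"]) / total)

print(total)           # 76
print(shares)          # {'Excellent': 42, 'Good': 28, 'Poor': 12, 'Unrepaired': 18}
print(good_or_better)  # 70
```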

As another concrete demonstration of the algorithm's performance, we
present in Figure 3 the erroneous Algol-like sample program used by
Graham and Rhodes to illustrate the performance of their error recovery
algorithm [G&R 75]. We used the same Algol subset grammar as they did,
and our repairs are identical to theirs, without the need for a
weighting for symbols or a pattern matching algorithm. We used the
algorithms presented in Chapter 5, except that we modified them to
prefer insertions to replacements and replacements to deletions. We rate
each repair as "excellent," except the insertion of the <identifier>
between * and / on line 5, which we rate "good" since we have no idea
what the human might have done.

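The stated preference (insertions before replacements before deletions)
amounts to an ordering on candidate repairs. A minimal Python sketch of
such an ordering follows; the tuple representation of a repair, the
`is_acceptable` callback, and the name `choose_repair` are hypothetical
and stand in for whatever tests the recovery algorithm actually applies.

```python
# Lower number = tried earlier. The ranking mirrors the stated preference.
PREFERENCE = {"insert": 0, "replace": 1, "delete": 2}

def choose_repair(candidates, is_acceptable):
    """Pick the first acceptable repair in order of preference.

    candidates    : iterable of (kind, detail) tuples, kind being one of
                    "insert", "replace", "delete" (assumed representation)
    is_acceptable : callable deciding whether a repair is consistent with
                    the left and right context (supplied by the caller)

    Returns the chosen repair, or None when no candidate is acceptable.
    """
    # sorted() is stable, so candidates of the same kind keep their order.
    for repair in sorted(candidates, key=lambda r: PREFERENCE[r[0]]):
        if is_acceptable(repair):
            return repair
    return None
```

Returning None corresponds to the "unrepaired" outcome, in which no
repair is selected and parsing continues without one.
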
We should note that backing down the stack infrequently resulted in a
good repair, except in the critical case of the deleted "if" in the
program of Figure 3. Thus, there is some question as to whether that
technique is worth its computational cost; it should at least be delayed
until the more productive techniques have failed. Clearly more research,
trials, and errors (no pun intended) are in order. As yet we have not
implemented Super_FMA+.


[Figure 3 listing: the erroneous Algol-like program, followed by the
recovery algorithm's diagnostics. The legible fragments report errors at
line 2, token 10; line 4, tokens 4, 14, and 17; and line 5, token 2.
Each report gives the unexpected token, the stack context to its left,
the context to its right, and the repair made. The rest of the listing
is not recoverable.]

Figure 3. Run of error recovery algorithm on program of Graham and Rhodes.